Add per-phase Prometheus analysis and consistent phase logging to loa…#179
Conversation
6823578 to
9f5568d
Compare
9f5568d to
0987ed7
Compare
|
Warning Review limit reached
More reviews will be available in 51 minutes and 5 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses rolling per-developer review limits. Reviews become available again as older review attempts age out of the rolling limit window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (8)
📝 WalkthroughWalkthroughThe PR introduces a shared ChangesThree-phase ACM workload orchestration with Prometheus analysis
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@acm-deploy-load/acm-telco-core-load.py`:
- Around line 298-302: The idle baseline and soak baseline Prometheus analysis
launches are gated on `cliargs.no_deploy == False` at lines 298 and 429
respectively, but the Policy-only mode logging at lines 253-254 indicates
Prometheus analysis should run. To make baseline analysis mode-independent as
suggested, remove the `cliargs.no_deploy == False` condition from both the
launch_prometheus_analysis calls for the idle baseline and soak baseline windows
so they execute regardless of deployment mode. Keep the Phase 2 batch analysis
behind the deploy gate since it depends on actual deployment occurring.
In `@acm-deploy-load/utils/analysis.py`:
- Around line 38-50: The command list construction and the logger.info call that
logs the command using " ".join(cmd) are currently placed before the try block.
Move both the cmd list definition and the logger.info statement inside the try
block so that if kubeconfig is unset or non-string, any resulting error will be
properly caught by the exception handler instead of bypassing the controlled
error handling path.
In `@acm-deploy-load/utils/output.py`:
- Around line 228-285: The code checks three conditional branches for different
combinations of no_deploy and no_policy flags but does not handle the case where
both cliargs.no_deploy and cliargs.no_policy are True. This causes phase2_label
to remain unassigned in that scenario, leading to an UnboundLocalError when
phase2_label is referenced in the log_write call at line 285. Add an additional
elif or else branch after the three existing conditions to explicitly handle the
case where both flags are True, and assign an appropriate value to phase2_label
in that branch.
In `@scripts/interval-ztp-install-all.sh`:
- Around line 12-33: The wrapper in interval-ztp-install-all.sh is not passing a
single kubeconfig value through both deploy and Prometheus analysis flows, so
the two steps can target different clusters. Update the script’s main
phase-argument wiring and the deploy/analyzer invocations to accept and reuse
the same kubeconfig variable consistently, using the existing phase option
handling around the wrapper logic and analyzer call sites.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 7f27caf6-ea5b-4e32-afb4-ca8c16ac926c
📒 Files selected for processing (7)
README.mdacm-deploy-load/acm-deploy-load.pyacm-deploy-load/acm-telco-core-load.pyacm-deploy-load/utils/analysis.pyacm-deploy-load/utils/output.pyacm-deploy-load/utils/ztp_monitor.pyscripts/interval-ztp-install-all.sh
0987ed7 to
ccb26fc
Compare
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
♻️ Duplicate comments (2)
acm-deploy-load/utils/output.py (1)
228-285: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winAdd the missing
no_deploy=True+no_policy=Truebranch before usingphase2_label.
phase2_labelis uninitialized in the both-true mode and then used at Line 285, causingUnboundLocalErrorduring report generation.Suggested fix
elif cliargs.no_deploy == True and cliargs.no_policy == False: phase2_label = "Policy Updates" log_write(report, " * Mode: Policy configmap updates only") log_write(report, " * Phase 1 / Idle Baseline (Start delay): {}s :: {}".format( cliargs.start_delay, str(timedelta(seconds=cliargs.start_delay)))) log_write(report, " * Phase 2 ({}):".format(phase2_label)) log_write(report, " * Update policy configmap ({} keys) in namespace {} per {}s interval".format( cliargs.hub_policy_cm_keys, cliargs.hub_policy_namespace, cliargs.interval_policy)) log_write(report, " * Maximum number of policy intervals to run: {}".format(cliargs.max_policy_intervals)) log_write(report, " * Phase 3 / Soak Baseline (End delay): {}s :: {}".format( cliargs.end_delay, str(timedelta(seconds=cliargs.end_delay)))) + else: + phase2_label = "No active workload" + log_write(report, " * Mode: Idle baseline + soak baseline only") + log_write(report, " * Phase 1 / Idle Baseline (Start delay): {}s :: {}".format( + cliargs.start_delay, str(timedelta(seconds=cliargs.start_delay)))) + log_write(report, " * Phase 2 ({}): no deploy/policy actions".format(phase2_label)) + log_write(report, " * Phase 3 / Soak Baseline (End delay): {}s :: {}".format( + cliargs.end_delay, str(timedelta(seconds=cliargs.end_delay))))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@acm-deploy-load/utils/output.py` around lines 228 - 285, The variable phase2_label is initialized in three conditional branches checking combinations of cliargs.no_deploy and cliargs.no_policy, but the fourth case where both are True is missing, causing an UnboundLocalError when phase2_label is used later at line 285. Add an elif branch for the case where cliargs.no_deploy == True and cliargs.no_policy == True, and initialize phase2_label with an appropriate descriptive label (such as "No Operations" or similar) for this scenario to ensure the variable is always defined before being used in the subsequent log_write calls.acm-deploy-load/utils/analysis.py (1)
38-50: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winMove command logging/build into guarded path and validate kubeconfig before join.
At Line 48,
" ".join(cmd)can throw ifkubeconfigis not a string, and that happens before thetry, so the function can crash instead of hitting the controlled warning path.Suggested fix
- cmd = [ - sys.executable, - analyzer_script, - "-k", kubeconfig, - "-s", start_str, - "-e", end_str, - "-b", "0", - "-p", phase_name, - report_dir, - ] - logger.info("Prometheus analysis command: {}".format(" ".join(cmd))) log_file = os.path.join(report_dir, "pa-{}.log".format(phase_name)) try: + if not isinstance(kubeconfig, str) or not kubeconfig: + logger.warning("Skipping prometheus analysis phase {}: kubeconfig is not set".format(phase_name)) + return + cmd = [ + sys.executable, + analyzer_script, + "-k", kubeconfig, + "-s", start_str, + "-e", end_str, + "-b", "0", + "-p", phase_name, + report_dir, + ] + logger.info("Prometheus analysis command: {}".format(" ".join(cmd))) with open(log_file, "w") as f: proc = subprocess.Popen( cmd,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@acm-deploy-load/utils/analysis.py` around lines 38 - 50, The command logging at line 48 using the join operation happens before the try block, which means if kubeconfig is not a string, the join will fail and crash the function instead of being handled by the exception handler. Move the logger.info call that logs the command and the log_file assignment inside the try block so they are guarded by exception handling. Additionally, validate that kubeconfig is a string before constructing the cmd list to prevent the join operation from failing due to invalid kubeconfig type.
🧹 Nitpick comments (1)
acm-deploy-load/acm-deploy-load.py (1)
338-345: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winValidate kubeconfig before running AAP detection.
Running
detect_aap_install(...)beforevalidate_kubeconfig(...)can log a misleading “AAP install not detected” when kubeconfig is invalid, then fail immediately after. Validate first, then run detection.Suggested reorder
- # Detect AAP install - if detect_aap_install(cliargs.kubeconfig, cliargs.dry_run): - logger.info("AAP install detected, waiting for playbook completion") - cliargs.wait_playbook = True - else: - logger.info("AAP install not detected") - - # Validate parameters - validate_kubeconfig(cliargs.kubeconfig) + # Validate parameters + validate_kubeconfig(cliargs.kubeconfig) + + # Detect AAP install + if detect_aap_install(cliargs.kubeconfig, cliargs.dry_run): + logger.info("AAP install detected, waiting for playbook completion") + cliargs.wait_playbook = True + else: + logger.info("AAP install not detected")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@acm-deploy-load/acm-deploy-load.py` around lines 338 - 345, Reorder the validation logic so that validate_kubeconfig is called before the detect_aap_install check. Move the validate_kubeconfig(cliargs.kubeconfig) call to execute first, before the if block that calls detect_aap_install. This ensures the kubeconfig is validated as valid before attempting AAP detection, preventing misleading "AAP install not detected" log messages when the kubeconfig is actually invalid.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@acm-deploy-load/utils/analysis.py`:
- Around line 38-50: The command logging at line 48 using the join operation
happens before the try block, which means if kubeconfig is not a string, the
join will fail and crash the function instead of being handled by the exception
handler. Move the logger.info call that logs the command and the log_file
assignment inside the try block so they are guarded by exception handling.
Additionally, validate that kubeconfig is a string before constructing the cmd
list to prevent the join operation from failing due to invalid kubeconfig type.
In `@acm-deploy-load/utils/output.py`:
- Around line 228-285: The variable phase2_label is initialized in three
conditional branches checking combinations of cliargs.no_deploy and
cliargs.no_policy, but the fourth case where both are True is missing, causing
an UnboundLocalError when phase2_label is used later at line 285. Add an elif
branch for the case where cliargs.no_deploy == True and cliargs.no_policy ==
True, and initialize phase2_label with an appropriate descriptive label (such as
"No Operations" or similar) for this scenario to ensure the variable is always
defined before being used in the subsequent log_write calls.
---
Nitpick comments:
In `@acm-deploy-load/acm-deploy-load.py`:
- Around line 338-345: Reorder the validation logic so that validate_kubeconfig
is called before the detect_aap_install check. Move the
validate_kubeconfig(cliargs.kubeconfig) call to execute first, before the if
block that calls detect_aap_install. This ensures the kubeconfig is validated as
valid before attempting AAP detection, preventing misleading "AAP install not
detected" log messages when the kubeconfig is actually invalid.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 38857c1c-70cd-4ab7-b00a-efb24f3ec9ba
📒 Files selected for processing (8)
README.mdacm-deploy-load/acm-deploy-load.pyacm-deploy-load/acm-telco-core-load.pyacm-deploy-load/utils/analysis.pyacm-deploy-load/utils/common_ocp.pyacm-deploy-load/utils/output.pyacm-deploy-load/utils/ztp_monitor.pyscripts/interval-ztp-install-all.sh
… load tools Extract launch_prometheus_analysis() into shared utils/analysis.py module used by both acm-deploy-load.py and acm-telco-core-load.py. Add phased workload structure (Idle Baseline, Cluster Deployment, Soak Baseline) with automatic background Prometheus analysis at phase boundaries for resource consumption comparison testing. Align logging, report generation, and section naming across both load tools. - Add -k/--kubeconfig and --no-prometheus-analysis CLI args to acm-deploy-load.py - Run 3 background analyses: idle-baseline, cluster-deployment, soak-baseline - Add periodic countdown logging during idle and soak baseline phases - Consolidate workload parameter logging into structured block matching telco-core pattern - Move telco-core inline report to generate_telco_core_report() in utils/output.py - Rename generate_report() to generate_deploy_load_report() for clarity - Rename "Deployed Cluster Orchestration" to "Workload Parameters" in report - Align report section names: Workload Parameters, Workload Duration Results, Workload Phases - Enrich deploy-load report Workload Parameters with phase durations, wait configs, and cluster details Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ccb26fc to
c9fa893
Compare
Extract launch_prometheus_analysis() into shared utils/analysis.py module
used by both acm-deploy-load.py and acm-telco-core-load.py. Add phased
workload structure (Idle Baseline, Cluster Deployment, Soak Baseline) with
automatic background Prometheus analysis at phase boundaries for resource
consumption comparison testing. Align logging, report generation, and section
naming across both load tools.